Large Scale Text Mining Approaches for Information Retrieval and Extraction

Authors

  • Patrice Bellot
  • Ludovic Bonnefoy
  • Vincent Bouvier
  • Frédéric Duvert
  • Young-Min Kim
Abstract

The problems of Natural Language Processing and Information Retrieval have been studied for a long time, but the recent availability of very large resources (Web pages, digital documents...) and the development of statistical machine learning methods that exploit annotated texts (manual annotation by crowdsourcing is a major new source) have transformed these fields. As a result, these approaches no longer need to be limited to highly specialized domains, and the cost of implementing them is reduced. In this chapter, our aim is to present some popular statistical text-mining approaches for information retrieval and information extraction, and to discuss the practical limits of current systems, which raise challenges for the future.

P. Bellot (corresponding author), V. Bouvier, Y.-M. Kim: CNRS, Aix-Marseille Université, LSIS UMR 7296, Av. Esc. Normandie-Niemen, 13397, Marseille cedex 20, France
L. Bonnefoy, V. Bouvier, F. Duvert: iSmart, 565 rue M. Berthelot, 13851, Aix-en-Provence cedex 3, France
L. Bonnefoy: LIA, Université d’Avignon et des Pays de Vaucluse, Agroparc, 84911, Avignon cedex 9, France

C. Faucher and L. C. Jain (eds.), Innovations in Intelligent Machines-4, Studies in Computational Intelligence 514, DOI: 10.1007/978-3-319-01866-9_1, Springer International Publishing Switzerland 2014

Similar resources

Text Mining: Promises and Challenges

Text mining, also known as knowledge discovery from text or document information mining, refers to the process of extracting interesting patterns from very large text corpora for the purpose of discovering knowledge. Text mining is an interdisciplinary field involving information retrieval, text understanding, information extraction, clustering, categorization, visualization, database technol...

Evaluation of Information Retrieval and Text Mining Tools on Automatic Named Entity Extraction

We report an evaluation of the Automatic Named Entity Extraction feature of IR tools on Dutch, French, and English text. The aim is to analyze the competency of off-the-shelf information extraction tools in recognizing entity types including person, organization, location, vehicle, time, and currency from unstructured text. Within such an evaluation one can compare the effectiveness of different ap...

Presenting a model for extracting information from text documents, based on text mining in the field of e-learning

As computer networks become the backbone of science and the economy, enormous quantities of documents become available. Text mining techniques have therefore been used to extract useful information from textual data. Text mining has become an important research area that discovers unknown information, facts, or new hypotheses by automatically extracting information from different written documents. T...

Facts from text: can text mining help to scale-up high-quality manual curation of gene products with ontologies?

The biomedical literature can be seen as a large integrated, but unstructured data repository. Extracting facts from literature and making them accessible is approached from two directions: manual curation efforts develop ontologies and vocabularies to annotate gene products based on statements in papers. Text mining aims to automatically identify entities and their relationships in text using ...

Using biomedical databases as knowledge sources for large-scale text mining

In this paper we discuss how terminological knowledge extracted from biomedical databases can be used effectively in large-scale processing of the biomedical literature. We briefly present an integrated information extraction and text mining environment which is capable of reliably identifying and disambiguating several categories of relevant domain entities, which can then constitute relevant ...

Publication date: 2014